106 research outputs found

Leveraging text data for causal inference using electronic health records

Author: Celi Leo A.
Kaufman Aaron R.
Miratrix Luke
Mozer Reagan
Publication date: 09/06/2023

Text is a ubiquitous component of medical data, containing valuable information about patient characteristics and care that is often missing from structured chart data. Despite this richness, it is rarely used in clinical research, owing partly to its complexity. Using a large database of patient records and treatment histories accompanied by extensive notes by attendant physicians and nurses, we show how text data can be used to support causal inference with electronic health data in all stages, from conception and design to analysis and interpretation, with minimal additional effort. We focus on studies using matching for causal inference. We augment a classic matching analysis by incorporating text in three ways: by using text to supplement a multiple imputation procedure, we improve the fidelity of imputed values to handle missing data; by incorporating text in the matching stage, we strengthen the plausibility of the matching procedure; and by conditioning on text, we can estimate easily interpretable text-based heterogeneous treatment effects that may be stronger than those found across categories of structured covariates. Using these techniques, we hope to expand the scope of secondary analysis of clinical data to domains where quantitative data is of poor quality or nonexistent, but where text is available, such as in developing countries.

arXiv.org e-Print Archive

Recommended from our members

Customized Prediction of Short Length of Stay Following Elective Cardiac Surgery in Elderly Patients Using a Genetic Algorithm

Author: Celi Leo A.
Govindan Sapna
Khabbaz Kamal R.
Lee Joon
Subramaniam Balachundhar
Publication venue: 'Scientific Research Publishing, Inc.'
Publication date: 11/03/2014

Objective: To develop a customized short LOS (<6 days) prediction model for geriatric patients receiving cardiac surgery, using local data and a computational feature selection algorithm. Design: Utilization of a machine learning algorithm in a prospectively collected STS database consisting of patients who received cardiac surgery between January 2002 and June 2011. Setting: Urban tertiary-care center. Participants: Geriatric patients aged 70 years or older at the time of cardiac surgery. Interventions: None. Measurements and Main Results: Predefined morbidity and mortality events were collected from the STS database. 23 clinically relevant predictors were investigated for short LOS prediction with a genetic algorithm (GenAlg) in 1426 patients. Due to the absence of an STS model for their particular surgery type, STS risk scores were unavailable for 771 patients. STS prediction achieved an AUC of 0.629 while the GenAlg achieved AUCs of 0.573 (in those with STS scores) and 0.691 (in those without STS scores). Among the patients with STS scores, the GenAlg features significantly associated with shorter LOS were absence of congestive heart failure (CHF) (OR = 0.59, p = 0.04), aortic valve procedure (OR = 1.54, p = 0.04), and shorter cross clamp time (OR = 0.99, p = 0.004). In those without STS prediction, short LOS was significantly correlated with younger age (OR = 0.93, p < 0.001), absence of CHF (OR = 0.53, p = 0.007), no preoperative use of beta blockers (OR = 0.66, p = 0.03), and shorter cross clamp time (OR = 0.99, p < 0.001). Conclusion: While the GenAlg-based models did not outperform STS prediction for patients with STS risk scores, our local-data-driven approach reliably predicted short LOS for cardiac surgery types that do not allow STS risk calculation. We advocate that each institution with sufficient observational data should build their own cardiac surgery risk models.

Harvard University - DASH

Interrogating a clinical database to study treatment of hypotension in the critically ill

Author: Celi Leo A
Kothari Rishi
Ladapo Joseph A
Lee Joon
Scott Daniel J
Publication venue: BMJ Group
Publication date: 01/01/2012

Crossref

PubMed Central

The PLOS ONE collection on machine learning in health and biomedicine: Towards open code and open data

Author: Celi Leo A
Citi Luca
Ghassemi Marzyeh
Pollard Tom J
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2019

Recent years have seen a surge of studies in machine learning in health and biomedicine, driven by digitalization of healthcare environments and increasingly accessible computer systems for conducting analyses. Many of us believe that these developments will lead to significant improvements in patient care. Like many academic disciplines, however, progress is hampered by lack of code and data sharing. In bringing together this PLOS ONE collection on machine learning in health and biomedicine, we sought to focus on the importance of reproducibility, making it a requirement, as far as possible, for authors to share data and code alongside their papers.

University of Essex Research Repository

DSpace@MIT

Directory of Open Access Journals

Datathons and Software to Promote Reproducible Research

Author: Celi Leo Anthony G.
Lokhandwala Sharukh
Montgomery Robert
Moses Christopher A
Naumann Tristan
Pollard Tom Joseph
Spitz Daniel
Stretch Robert
Publication venue: 'JMIR Publications Inc.'
Publication date: 01/08/2016

Background: Datathons facilitate collaboration between clinicians, statisticians, and data scientists in order to answer important clinical questions. Previous datathons have resulted in numerous publications of interest to the critical care community and serve as a viable model for interdisciplinary collaboration. Objective: We report on an open-source software suite called Chatto that was created by members of our group, in the context of the second international Critical Care Datathon, held in September 2015. Methods: Datathon participants formed teams to discuss potential research questions and the methods required to address them. They were provided with the Chatto suite of tools to facilitate their teamwork. Each multidisciplinary team spent the next 2 days with clinicians working alongside data scientists to write code, extract and analyze data, and reformulate their queries in real time as needed. All projects were then presented on the last day of the datathon to a panel of judges that consisted of clinicians and scientists. Results: Use of Chatto was particularly effective in the datathon setting, enabling teams to reduce the time spent configuring their research environments to just a few minutes, a process that would normally take hours to days. Chatto continued to serve as a useful research tool after the conclusion of the datathon. Conclusions: This suite of tools fulfills two purposes: (1) facilitation of interdisciplinary teamwork through archiving and version control of datasets, analytical code, and team discussions, and (2) advancement of research reproducibility by functioning postpublication as an online environment in which independent investigators can rerun or modify analyses with relative ease. With the introduction of Chatto, we hope to solve a variety of challenges presented by collaborative data mining projects while improving research reproducibility.

DSpace@MIT

PubMed Central

The challenges of combatting antimicrobial resistance in the Philippines

Author: Celi Leo Anthony G
Eala Michelle Ann B
Paguio Joseph A
Robredo Janine Patricia G
Salamat Maria Sonia S
Publication venue: Archīum Ateneo
Publication date: 01/01/2022

archīum.ATENEO (Ateneo de Manila Univ.)

The association between the neutrophil-to-lymphocyte ratio and mortality in critical illness: an observational cohort study

Author: Celi Leo Anthony G.
Marshall Dominic C
Pimentel Marco A
Pollard Tom
Salciccioli Justin D
Santos Mauro D
Shalhoub Joseph
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

Introduction: The neutrophil-to-lymphocyte ratio (NLR) is a biological marker that has been shown to be associated with outcomes in patients with a number of different malignancies. The objective of this study was to assess the relationship between NLR and mortality in a population of adult critically ill patients. Methods: We performed an observational cohort study of unselected intensive care unit (ICU) patients based on records in a large clinical database. We computed individual patient NLR and categorized patients by quartile of this ratio. The association of NLR quartiles and 28-day mortality was assessed using multivariable logistic regression. Secondary outcomes included mortality in the ICU, in-hospital mortality and 1-year mortality. An a priori subgroup analysis of patients with versus without sepsis was performed to assess any differences in the relationship between the NLR and outcomes in these cohorts. Results: A total of 5,056 patients were included. Their 28-day mortality rate was 19%. The median age of the cohort was 65 years, and 47% were female. The median NLR for the entire cohort was 8.9 (interquartile range, 4.99 to 16.21). Following multivariable adjustments, there was a stepwise increase in mortality with increasing quartiles of NLR (first quartile: reference category; second quartile odds ratio (OR) = 1.32; 95% confidence interval (CI), 1.03 to 1.71; third quartile OR = 1.43; 95% CI, 1.12 to 1.83; fourth quartile OR = 1.71; 95% CI, 1.35 to 2.16). A similar stepwise relationship was identified in the subgroup of patients who presented without sepsis. The NLR was not associated with 28-day mortality in patients with sepsis. Increasing quartile of NLR was statistically significantly associated with the secondary outcomes. Conclusion: The NLR is associated with outcomes in unselected critically ill patients. In patients with sepsis, there was no statistically significant relationship between NLR and mortality. Further investigation is required to increase understanding of the pathophysiology of this relationship and to validate these findings with data collected prospectively.
National Institutes of Health (U.S.) (Grant R01 EB017205-01A1)
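
The exposure definition in this abstract (compute each patient's NLR, then group patients by quartile of the ratio) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `nlr` and `quartile_labels`, the sample counts, and the choice of the standard library's default quantile method are all assumptions made for illustration, and the multivariable logistic regression step is not shown.

```python
from statistics import quantiles


def nlr(neutrophils, lymphocytes):
    """Neutrophil-to-lymphocyte ratio; both counts must use the same units."""
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return neutrophils / lymphocytes


def quartile_labels(values):
    """Label each value 1..4 using quartile cut points taken from the sample."""
    q1, q2, q3 = quantiles(values, n=4)  # three cut points, default method
    labels = []
    for v in values:
        if v <= q1:
            labels.append(1)
        elif v <= q2:
            labels.append(2)
        elif v <= q3:
            labels.append(3)
        else:
            labels.append(4)
    return labels


# Hypothetical (neutrophil, lymphocyte) counts, invented for illustration only.
counts = [(4.0, 2.0), (6.0, 1.5), (9.0, 1.0), (12.0, 0.8),
          (3.0, 1.5), (8.0, 0.5), (5.0, 1.0), (10.0, 2.0)]

ratios = [nlr(n, l) for n, l in counts]
groups = quartile_labels(ratios)

for ratio, group in zip(ratios, groups):
    print(f"NLR={ratio:.1f} quartile={group}")
```

In an analysis of the kind described, the quartile label would then enter a regression model as a categorical covariate with the first quartile as the reference category.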

DSpace@MIT

Springer - Publisher Connector

PubMed Central